Bayesian multistudy factor analysis for high-throughput biological data

Authors

Abstract

This paper analyzes breast cancer gene expression across seven studies to identify genuine and thus replicable gene patterns shared among these studies. Our premise is that genuine biological signal is more likely to be reproducibly present in multiple studies than spurious signal. Our analysis uses a new modeling strategy for the joint analysis of high-throughput biological studies which simultaneously identifies shared as well as study-specific signal. To this end, we generalize the multi-study factor analysis model to handle high-dimensional data in a sparse Bayesian infinite factor model context. We provide strategies for the identification of the loading matrices, both common and study-specific. Through an extensive simulation analysis, we characterize the performance of the proposed approach in various scenarios and show that it outperforms standard factor analysis in identifying replicable signal in all the scenarios considered. The analysis of breast cancer gene expression across the seven studies identifies clear replicable gene patterns. These patterns are related to well-known biological pathways involved in breast cancer, such as the ER, cell cycle, immune system, collagen, and metabolic pathways. Some patterns are also associated with existing breast cancer subtypes, such as LumA, Her2, and basal subtypes, while other patterns reveal novel biological activity that is active across subtypes and missed by hierarchical clustering approaches. The R package MSFA implementing the method is available on GitHub.
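
For orientation, the multi-study factor model referred to in the abstract is commonly written as below. This is a sketch only: the abstract gives no notation, so every symbol here (S studies, P genes, K shared factors, J_s study-specific factors, and the matrices Φ, Λ_s, Ψ_s) is an assumption introduced for illustration, and the sparse infinite-factor priors the paper places on the loadings are not shown.

```latex
% Assumed notation: x_{is} is the P-vector of (centered) expression values
% for sample i in study s; \Phi holds the loadings shared by all studies,
% \Lambda_s the loadings specific to study s.
\begin{align*}
  x_{is} &= \Phi f_{is} + \Lambda_s l_{is} + e_{is},
    \qquad i = 1,\dots,n_s, \quad s = 1,\dots,S, \\
  f_{is} &\sim N_K(0, I_K), \qquad
  l_{is} \sim N_{J_s}(0, I_{J_s}), \qquad
  e_{is} \sim N_P(0, \Psi_s), \quad \Psi_s \text{ diagonal}, \\
  \Sigma_s &= \operatorname{Cov}(x_{is})
    = \Phi\Phi^{\top} + \Lambda_s\Lambda_s^{\top} + \Psi_s .
\end{align*}
```

Under this form, the term ΦΦᵀ is the part of each study's covariance that replicates across studies, while Λ_sΛ_sᵀ absorbs structure seen in study s alone, which is the distinction between shared and study-specific signal drawn in the abstract.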

Similar resources

Novel Bioinformatics Approaches for Analysis of High-Throughput Biological Data

1Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan 2Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan 320, Taiwan 3Institute of Systems Biology and Bioinformatics, National Central University, Taoyuan 320, Taiwan 4Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 701, Taiwan 5Graduate Institute of...

Statistical Methods for High-Throughput Biological Data

The explosion in DNA microarray technology in the last decade has given rise to extensive biological data in the form of expression profiles of tens of thousands of genes and proteins, often from only a handful of tissue samples. The principal objective of a high-throughput experiment can be generally characterized as one of class comparison, class prediction or molecular pattern discovery. Cla...

Pathway analysis of high-throughput biological data within a Bayesian network framework

MOTIVATION: Most current approaches to high-throughput biological data (HTBD) analysis perform either individual gene/protein analysis or gene/protein set enrichment analysis for a list of biologically relevant molecules. Bayesian Networks (BNs) capture linear and non-linear interactions, handle stochastic events accounting for noise, and focus on local interactions, which can be related to cau...

Integrative Modeling and Analysis of High-throughput Biological Data

Computational biology is an interdisciplinary field that focuses on developing mathematical models and algorithms to interpret biological data so as to understand biological problems. With current high-throughput technology development, different types of biological data can be measured on a large scale, which calls for more sophisticated computational methods to analyze and interpret the data. I...

Computational Methods for Learning Bayesian Networks from High-Throughput Biological Data

Data from high-throughput technologies, such as gene expression microarrays, promise to yield insight into the nature of the cellular processes that have been disrupted by disease, thus improving our understanding of the disease and hastening the discovery of effective new treatments. Most of the analysis thus far has focused on identifying differential measurements, which form the basis of bio...

Journal

Journal title: The Annals of Applied Statistics

Year: 2021

ISSN: 1941-7330, 1932-6157

DOI: https://doi.org/10.1214/21-aoas1456